Dominant Pathways in Protein Folding 
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We present a method to investigate the kinetics of protein folding on a long time-scale and the dynamics 
underlying the formation of secondary and tertiary structures during the entire reaction. The approach is based 
on the formal analogy between thermal and quantum diffusion: by writing the solution of the Fokker-Planck 
equation for the time-evolution of a protein in a viscous heat-bath in terms of a path integral, we derive a 
Hamilton-Jacobi variational principle from which we are able to compute the most probable pathway of folding. 
The method is applied to the folding of the Villin Headpiece Subdomain, in the framework of a Go-model. We 
have found that, in this model, the transition occurs through an initial collapsing phase driven by the starting 
coil configuration and a later rearrangement phase, in which secondary structures are formed and all computed 
paths display strong similarities. This method is completely general, does not require the prior knowledge of any 
reaction coordinate and represents an efficient tool to perfom ab-initio simulations of the entire folding process 
with available computers. 
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Understanding the kinetics of protein folding and the dy- 
namical mechanisms involved in the formation of their struc- 
tures in an all-atom approach involves simulating a statisti- 
cally significant ensemble of folding trajectories for a system 
of- 10 4 de grees of freedom. Unfortunately, the existence 
of a huge gap between the microscopic time-scale of the ro- 
tational degrees of freedom — 10~ 12 s and the macroscopic 
time scales of the full folding process ~ 1(T 6 - 10 1 s makes 
it extremely computationally challenging to follow the evo- 
lution of a typical — 100-residue protein for a time interval 
longer than few tens of nanoseconds. 

Several approaches have been proposed to overcome such 
computational difficulties and address the problem of identi- 
fying the relevant pathways of the folding reaction H. Un- 
fortunately these methods are either affected by uncontrolled 
systematic errors associated to ad-hoc approximations, or can 
only be applied to small proteins with typical folding time of 
the order of few nanoseconds (fast folders). In this Letter we 
present a novel approach to overcome these difficulties: we 
adopt the Langevin approach and devise a method to rigor- 
ously define and practically compute the most statistically rel- 
evant protein folding pathway. As a first exploratory applica- 
tion, we have studied the folding transition of the 36-monomer 
Villin Headpiece Subdomain (PDB code 1 VII). This molecule 
has been extensively studied in the literature because it is the 
smallest polypeptide that has all of the properties of a single 
domain protein and in addition, it is one of the fastest fold- 
ers |3I . The ribbon representation of this system is shown in 
Fig [2 We analyze the transition from different random self- 
avoiding coil states to the native state, whose structure was 
obtained from the Brookhaven Protein Data Bank. 

Our study is based on the analogy between Langevin dif- 
fusion and quantum propagation. Previous studies have ex- 
ploited such a connection to study a variety of diffusive prob- 
lems using path integral methods 0,0]. In this work we de- 



velop the formalism to determine explicitly the evolution of 
the position of each monomer of the protein, during the entire 
folding transition, without relying on a specific choice of the 
reaction coordinate. 

Before entering the details of our calculation it is con- 
venient to review the mathematical framework in a simple 
case. For this purpose, let us consider Langevin diffusion of a 
point-particle in one-dimension, subject to an external poten- 
tial U(x): 
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where T[(t) is a Gaussian noise with zero average and correla- 
tion given by (r)(f)ri(f / )} = 2Db(t — t'). In this equation, D is 
the diffusion constant of the particle in the solvent, kg and T 
are respectively the Boltzmann constant and the temperature. 

The probability to find the particle at position x at time t 
obeys the well-known Fokker-Planck Equation: 
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It is well-known that the stationary solution of (0 is the 
Boltzmann distribution P(x) — exp(—U(x)/kgT). The solu- 
tion of (0, subject to the boundary conditions x(t{) = X\ and 
x(tf) = Xf can be expressed in terms of a path-integral: 
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FIG. 1: Ribbon representation of the Villin Headpiece Subdomain, 
drawn using Raster3D ll2ll 

This result shows that the problem of studying the diffu- 
sion of a classical particle at temperature T in a medium with 
diffusion constant D can be mapped into the problem of de- 
termining its quantum-mechanical propagation in imaginary 
time, subject to the effective potential V e ff(x). This approach 
has substantial differences from the one introduced in Ref . 1 6 ] , 
where the second derivative of eq.(@} is neglected. Such an ap- 
proximation is not consistent with the Fokker-Planck equation 
(0, and it leads, at large times, to a distribution which is not 
the Boltzmann distribution [7]. Our approach also differs from 
the one introduced in Ref. [ 8 ] where thermal fluctuations were 
neglected and friction effects were partially accounted for by 
choosing large discretization steps to filter-out high-frequency 
modes. 

The most probable path contributing to (0 is the one for 
which the exponential weig ht e -w 2 ° is maximum, hence 
for which S e ff is minimum. A trajectory which connects 
configurations that are not classically accessible in the ab- 
sence of thermal fluctuations corresponds to an instanton in 
the quantum-mechanical language. 

The same framework can be applied to study the protein 
folding, in which the one-instanton solutions represent the 
most probable folding trajectories (which we shall refer to 
as Dominant Folding Pathway, DFP). Determining the DFP 
for realistic proteins using conventional methods — such as 
Molecular Dynamics — is extremely challenging from the 
computational point of view. In addition to the numerical dif- 
ficulties associated with the existence of very different time 
scales, one has also to face the solution of boundary-value 
problems, which are considerably harder than initial-value 
problems. 

Fortunately, a dramatic simplification is obtained upon ob- 
serving that the dynamics described by the effective action 
S e ff is energy-conserving and time-reversible. This property 
allows us to switch from the f/me-dependent Newtonian de- 
scription to the energy-dependent Hamilton-Jacobi (HJ) de- 
scription. We note that this could not be done at the level 
of the Langevin equations (or adopting the Onsager-Machlup 
action). In the HJ framework, the Dominant Folding Path- 
way connecting given initial and final positions is obtained 
by minimizing — not just extremizing — the target function 
(HJ functional) 

Shj = fj dl ^2{E eff + V eff [x{l)\), (5) 
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FIG. 2: The evolution of the radius of gyration as a function of the 
fraction of the total displacement covered during the folding transi- 
tions in 6 paths corresponding to different initial random coil config- 
urations. 

where dl is an infinitesimal displacement along the path tra- 
jectory. E e ff is a free parameter which determines the total 
time elapsed during the transition, according to: 

tf - ti= C dl {^^mr (6) 

It should be stressed that the conserved quantity E e ff does 
not correspond to the physical energy of the folding transi- 
tion (which is not conserved in the presence of random forces 
and friction). In principle, a statistical distribution of folding 
times can be obtained by modeling the statistical distribution 
of E e ff (for example through MD simulations). In the present 
work, we adopted the simple choice E e tf = —V e ff{xf), which 
corresponds to the longest folding time. However, we have 
noted that the minimization of the HJ action by varying the 
value of E e ff of a factor up to 5 leads to comparable results. 
The HJ formulation of the dynamics leads to an impressive 
computational simplification of this problem. In fact, the total 
Euclidean distance between the coil state and the native state 
of a typical protein is only 1-2 orders of magnitude larger than 
the most microscopic length scale, i.e. the typical monomer 
(or atom) size. As a consequence, only ~ 100 discretized dis- 
placement steps are sufficient for convergence. This number 
should be compared with 10 12 time-steps required in the time- 
dependent Newtonian description. As a result of this sim- 
plification, within our approach simulating the entire folding 
process for a typical protein becomes feasible with available 
computers. The physical reason why the HJ formulation is so 
much more efficient compared to the Newtonian formulation 
is the following: in traditional Molecular Dynamics simula- 
tions, proteins spend most of their time in meta-stable min- 
ima, trying to overcome free-energy barriers. The HJ formu- 
lation avoids investing computational times in such "waiting" 
phases by considering intervals of fixed displacements, rather 
than fixed time-length. The numerical advantages of the HJ 
formalism for describing long-time dynamics at constant en- 
ergy were first pointed-out in |8]. In this work, we show that 
comparable computational advantages can also be achieved 
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FIG. 3: The evolution of the percentage of monomers in alpha-helix 
conformation as a function of the fraction of the total displacement 
covered during the folding transitions in 6 paths corresponding to 
different initial random coil configurations. 
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FIG. 4: The evolution of the number of contacts as a function of the 
fraction of the total displacement covered during the folding transi- 
tions in 6 paths corresponding to different initial random coil config- 
urations. 



for stochastic dynamics at fixed temperature, in which the ef- 
fects associated with thermal fluctuations and dissipation are 
consistently taken into account. 

Let us now apply this formalism to the study of the kinetics 
of the protein folding. Although the ultimate goal is to char- 
acterize folding pathways using an all-atom description, in 
this exploratory study we test our method on a very schematic 
model in which the effective degrees of freedom (monomers) 
are representative of amino-acids, and have a fixed mass. The 
monomer-monomer interaction is chosen to be the sum of a 
harmonic bond along the chain, supplemented by a repulsive 
core between non-consecutive monomers and by an attractive 
basin between monomers which are in contact in the native 
state (Go-Model |9]). The detailed form of the potential used 
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where r,-,- = |x, ■ — x ; | and = 1 if i and j are in native con- 
tact, while Gjj — otherwise. The parameters in the potential 
have been chosen to be of the same order of similar Go-Model 
applications (see 11(111 and references therein): a = 0.38 nm, 
Ro = 0.45 nm , R, = 0.65 nm, e = 2 Kcal/mol. In this first 
exploratory study we chose to keep the problem as simple as 
possible and did not include Coulombic, angular or torsional 
interactions. Hence, the present simple model is not expected 
to be realistic in predicting the kinetics of tertiary structures 
formation: the collapse of the protein will be driven mostly by 
the boundary conditions. On the other hand, the Go-potential 
may be sufficiently long-ranged to be effective in the determi- 
nation of local secondary structures. 

The DFP was obtained minimizing numerically the dis- 



A/„ „ + i is the Euclidean measure of the n — th elementary path 
step and P is a penalty function which keeps all the length 
elements close to their average 1 8 ] and becomes irrelevant in 
the continuum limit. 

We have checked that, with 100 discretization steps, simu- 
lations performed on a wide range of X lead to consistent re- 
sults. The minimization of the discretized HJ effective action 
was performed applying an adaptive simulated annealing al- 
gorithm and using 50 and 100 path discretization steps. After 
a preliminary thermalization phase based on usual Metropolis 
algorithm, we performed about 5 cooling cycles, consisting of 
8000 cooling steps each. In order to avoid trapping in local 
minima, at the begin of each cooling cycle, the configuration 
was heated-up with few Metropolis steps. At the end of each 
cooling cycle, the boldness of the Monte Carlo moves was 
adapted, in order to keep the rejection rate ~ 90%. Each 
calculation lasted for approximately ~ 12 hours on a single- 
processor work station. We considered the folding transitions 
from 6 different random self-avoiding coil configurations to 
the same native state. The center of mass was subtracted form 
each configurations. 

The results of the simulations performed at T — 300 K 
and damping constant y = -fj- = 0.1ns are reported in 
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Fig. 12 Fig 13 and Fig|4] which show respectively the evolu- 
tion of the radius of gyration, the percentage of monomers 
in alpha-helix conformation and the number of contacts, as a 
function of the fraction of the total conformational changes. 
(The total conformational change is defined as the total Eu- 
clidean distance covered along the path: £„=i A l n .n+i-) 

Some comments on these results are in order. First of all 
we note that, in all simulations performed, the folding tran- 
sition occurs through two rather distinct regimes: in an early 
stage, involving the first ~ 80% of the total conformational 
changes, the paths are quite different from each other and no 
secondary structure is formed. The radius of gyration is de- 
creasing until about 60% of the reaction and then remains es- 
sentially constant. Correspondingly, the number of contacts is 
first increasing and then remains constant. These results sug- 
gest that the initial phase of the folding reaction consits of a 
collapse of the protein, which strongly depends on the initial 
coil configuration. Only in the last 20% of the conformational 
evolution, the protein is rearranging to give rise to secondary 
structures. This finding is in qualitative agreement with recent 
experiments on Villin folding kinetics |3J] in which the fluo- 
rescence quantum yield and frequency shift were investigated 
with laser temperature -jump. It was found that the unfolding 
kinetics could be fitted with a bi-exponential function, with 
time constants of 70 ns and 5 /j s. The 70 ns phase was inter- 
preted as related to the formation and melting of the helical 
turn connecting residues W23 and H27. 

We also note that in the last 20% of the reaction, all paths 
exihibit some degree of similarity. This is a natural conse- 
quence of the funneled structure of the energy landscape in 
our topology-based model. 

In conclusion, in the present work we have shown how 
the formal analogy between Langevin diffusion and quantum 
propagation can be exploited to perform efficient simulations 
of the entire protein folding transition. The framework devel- 
oped in this work is completely general, i.e. it does not rely 
on the particular choice of the relevant degrees of freedom nor 
on the structure of the interactions. Unlike other approaches 
based on a time-dependent description of the dynamics, the 
present approach does not suffer from limitations associated 
to rare events and therefore its applicability is not limited to 
very small proteins or fast-folders. A major improvement con- 
nected to the use of this approach is the significant reduction 
of the computer time necessary for the computation coming 
from the different treatment of the fluctuations which deter- 
mine the time scale of Newtonian dynamics. As a result of 
this simplification, within our approach simulating the entire 
folding process for a typical protein becomes feasible with 



available computers. 

Since the focus of the present work was on methodology 
rather than on phenomenology, we have performed our ex- 
ploratory numerical analysis using a coarse-grained topology- 
based model. We have shown that the approach is computa- 
tionally feasible and allows to access important information 
about the evolution of the different structures. We have found 
that, in such a simple model, the transition occurs through 
an initial collapsing phase driven by the starting coil config- 
uration and a later rearrangement phase, in which all com- 
puted paths display strong similarities. Simulations using 
more sophisticated all-atom models are in progress and will 
clarify whether these are general features or are biases of the 
topology-based model adopted in this work. 
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